

2024.02.05


Nice to meet you. I'm Leona from the New Business Division at Classmethod, Inc.

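At Classmethod, Inc., we operate and evaluate a QA chatbot built on RAG (Retrieval-Augmented Generation) to improve the accuracy of searching and answering from internal information. For each user question, the system retrieves related internal documents, and an LLM generates an answer based on them. In day-to-day operation, however, users do not always get the information they need.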





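The embedding models used here are OpenAI's Text-Embedding-Ada-002 and AWS Bedrock's Amazon.Titan-Embed-Text-v1. To use them, you need access to the OpenAI API and you need to enable the models in AWS Bedrock. For details on enabling models in AWS Bedrock, see the following article.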

If you want to try Amazon Bedrock from the Management Console, set up access to the Base Models (foundation models)
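Once model access is enabled, fetching a single Titan embedding can look roughly like the sketch below. This is not the code used in this post: it assumes a boto3 `bedrock-runtime` client is passed in, and the helper name `titan_embed` is made up for illustration.

```python
import json


def titan_embed(bedrock_runtime, text, model_id="amazon.titan-embed-text-v1"):
    """Return the embedding vector for `text`.

    `bedrock_runtime` is assumed to be a boto3 "bedrock-runtime" client
    (hypothetical helper, not part of any library).
    """
    response = bedrock_runtime.invoke_model(
        modelId=model_id,
        body=json.dumps({"inputText": text}),
        accept="application/json",
        contentType="application/json",
    )
    # The response body is a stream containing JSON with an "embedding" list
    payload = json.loads(response["body"].read())
    return payload["embedding"]


# Usage (requires AWS credentials and enabled model access):
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   vector = titan_embed(client, "how do bi-encoders work for sentence embeddings")
```

Passing the client in as an argument keeps the helper easy to exercise with a stub when no AWS credentials are available.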

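For the experiment, we first need to define a concrete query and prepare documents related to it. Following OpenAI's sample code, I run the query through the arxiv API provided by arXiv, a preprint server for papers that have not yet been peer reviewed, and treat each paper's abstract as a document.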



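# Install the arxiv package first (run this in a shell, not in Python):
#   pip install arxiv
import arxiv

query = "how do bi-encoders work for sentence embeddings"
client_arxiv = arxiv.Client()
search = arxiv.Search(
    query=query, max_results=20, sort_by=arxiv.SortCriterion.Relevance
)

# Fetch the results and print the titles in arXiv's own relevance order
result_list = list(client_arxiv.results(search))
for i, result in enumerate(result_list, start=1):
    print(f"{i}: {result.title}")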


1: A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation
2: SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features
3: Are Classes Clusters?
4: Semantic Composition in Visually Grounded Language Models
5: Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions
6: Learning Probabilistic Sentence Representations from Paraphrases
7: Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings
8: How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation
9: Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences
10: Vec2Sent: Probing Sentence Embeddings with Natural Language Generation
11: Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings
12: SentPWNet: A Unified Sentence Pair Weighting Network for Task-specific Sentence Embedding
13: Learning Joint Representations of Videos and Sentences with Web Image Search
14: Character-based Neural Networks for Sentence Pair Modeling
15: Train Once, Test Anywhere: Zero-Shot Learning for Text Classification
16: Efficient Domain Adaptation of Sentence Embeddings Using Adapters
17: Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models
18: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
19: In Search for Linear Relations in Sentence Embedding Spaces
20: Learning to Borrow -- Relation Representation for Without-Mention Entity-Pairs for Knowledge Graph Completion



| Rank | arXiv original | AWS Bedrock Amazon.Titan-Embed-Text-v1 | OpenAI Text-Embedding-Ada-002 |
|---|---|---|---|
| 1 | A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation | A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation | Vec2Sent: Probing Sentence Embeddings with Natural Language Generation |
| 2 | SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features | Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models | Are Classes Clusters? |
| 3 | Are Classes Clusters? | In Search for Linear Relations in Sentence Embedding Spaces | Semantic Composition in Visually Grounded Language Models |
| 4 | Semantic Composition in Visually Grounded Language Models | Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models | Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models |
| 5 | Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions | Are Classes Clusters? | How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation |
| 6 | Learning Probabilistic Sentence Representations from Paraphrases | Are Classes Clusters? | SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features |
| 7 | Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings | SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features | Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings |
| 8 | How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation | SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features | Train Once, Test Anywhere: Zero-Shot Learning for Text Classification |
| 9 | Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences | Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings | Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences |
| 10 | Vec2Sent: Probing Sentence Embeddings with Natural Language Generation | Vec2Sent: Probing Sentence Embeddings with Natural Language Generation | A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation |
| 11 | Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings | Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings | Efficient Domain Adaptation of Sentence Embeddings Using Adapters |
| 12 | SentPWNet: A Unified Sentence Pair Weighting Network for Task-specific Sentence Embedding | Semantic Composition in Visually Grounded Language Models | Learning Probabilistic Sentence Representations from Paraphrases |
| 13 | Learning Joint Representations of Videos and Sentences with Web Image Search | Semantic Composition in Visually Grounded Language Models | Learning to Borrow -- Relation Representation for Without-Mention Entity-Pairs for Knowledge Graph Completion |
| 14 | Character-based Neural Networks for Sentence Pair Modeling | Efficient Domain Adaptation of Sentence Embeddings Using Adapters | Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings |
| 15 | Train Once, Test Anywhere: Zero-Shot Learning for Text Classification | Efficient Domain Adaptation of Sentence Embeddings Using Adapters | In Search for Linear Relations in Sentence Embedding Spaces |
| 16 | Efficient Domain Adaptation of Sentence Embeddings Using Adapters | Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences | Character-based Neural Networks for Sentence Pair Modeling |
| 17 | Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models | Learning Probabilistic Sentence Representations from Paraphrases | SentPWNet: A Unified Sentence Pair Weighting Network for Task-specific Sentence Embedding |
| 18 | Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models | Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models | Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions |
| 19 | In Search for Linear Relations in Sentence Embedding Spaces | Learning Joint Representations of Videos and Sentences with Web Image Search | Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models |
| 20 | Learning to Borrow -- Relation Representation for Without-Mention Entity-Pairs for Knowledge Graph Completion | How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation | Learning Joint Representations of Videos and Sentences with Web Image Search |
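The embedding-based orderings above come from scoring each document against the query and sorting by that score. As a rough sketch of the idea, assuming cosine similarity (a common choice; the exact measure used inside the library is not stated in this post) and toy 2-D vectors in place of real embeddings:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(query_vec, docs):
    """docs is a list of (title, vector); return titles, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), title) for title, vec in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


# Toy vectors standing in for real embeddings (hypothetical values)
query_vec = [1.0, 0.0]
docs = [
    ("doc A", [0.0, 1.0]),
    ("doc B", [1.0, 0.1]),
    ("doc C", [0.5, 0.5]),
]
print(rerank(query_vec, docs))  # ['doc B', 'doc C', 'doc A']
```

In a real run, `query_vec` and each document vector would come from the same embedding model, since vectors from different models are not comparable.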




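# Import ServiceContext, Bedrock and BedrockEmbedding from llama_index
from llama_index import ServiceContext, set_global_service_context
from llama_index.llms import Bedrock
from llama_index.embeddings import BedrockEmbedding

# Parameters for the Titan model
model_kwargs_titan = {
    "stopSequences": [],
    # the remaining parameters are not shown in the source
}

# Create the Bedrock LLM instance
llm = Bedrock(
    model="amazon.titan-text-express-v1",  # changed from amazon.titan-tg1-large
    # the remaining arguments are not shown in the source
)

# Create the BedrockEmbedding instance
embed_model = BedrockEmbedding.from_credentials(
    model_name="amazon.titan-embed-text-v1"  # changed from amazon.titan-embed-g1-text-02
)

# Chunk overlap
chunk_overlap = 20
# Chunk size
chunk_size = 512
# Build the service context
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
)
# Register it as the global service context
set_global_service_context(service_context)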


| Rank | Chunking=512 | Chunking=2048 |
|---|---|---|
| 1 | A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation | A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation |
| 2 | Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models | In Search for Linear Relations in Sentence Embedding Spaces |
| 3 | In Search for Linear Relations in Sentence Embedding Spaces | Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models |
| 4 | Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models | Are Classes Clusters? |
| 5 | Are Classes Clusters? | Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings |
| 6 | Are Classes Clusters? | SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features |
| 7 | SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features | Vec2Sent: Probing Sentence Embeddings with Natural Language Generation |
| 8 | SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features | Semantic Composition in Visually Grounded Language Models |
| 9 | Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings | Efficient Domain Adaptation of Sentence Embeddings Using Adapters |
| 10 | Vec2Sent: Probing Sentence Embeddings with Natural Language Generation | Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences |
| 11 | Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings | Learning Probabilistic Sentence Representations from Paraphrases |
| 12 | Semantic Composition in Visually Grounded Language Models | Learning Joint Representations of Videos and Sentences with Web Image Search |
| 13 | Semantic Composition in Visually Grounded Language Models | How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation |
| 14 | Efficient Domain Adaptation of Sentence Embeddings Using Adapters | Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings |
| 15 | Efficient Domain Adaptation of Sentence Embeddings Using Adapters | Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models |
| 16 | Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences | SentPWNet: A Unified Sentence Pair Weighting Network for Task-specific Sentence Embedding |
| 17 | Learning Probabilistic Sentence Representations from Paraphrases | Character-based Neural Networks for Sentence Pair Modeling |
| 18 | Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models | Train Once, Test Anywhere: Zero-Shot Learning for Text Classification |
| 19 | Learning Joint Representations of Videos and Sentences with Web Image Search | Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions |
| 20 | How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation | Learning to Borrow -- Relation Representation for Without-Mention Entity-Pairs for Knowledge Graph Completion |
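With Chunking=512 the same title shows up more than once, while with Chunking=2048 each title appears once. One possible reading, which is my assumption and not something verified here, is that a smaller chunk size splits some documents into several chunks that are then ranked separately. The sketch below illustrates overlapping chunking on a plain token list and a simple way to collapse chunk-level hits back to unique titles. Both helper names are hypothetical, and the splitting is simplified compared with what llama_index actually does.

```python
def split_into_chunks(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks


def dedupe_by_title(ranked_titles):
    """Keep only the first (best-ranked) occurrence of each title."""
    seen = set()
    unique = []
    for title in ranked_titles:
        if title not in seen:
            seen.add(title)
            unique.append(title)
    return unique


tokens = list(range(10))
print(split_into_chunks(tokens, chunk_size=4, chunk_overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
print(split_into_chunks(tokens, chunk_size=20, chunk_overlap=1))
# [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
print(dedupe_by_title(["A", "B", "A", "C", "B"]))
# ['A', 'B', 'C']
```

A larger chunk size keeps a short document in a single chunk, so it can only appear once in the ranking.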


With OpenAI's Text-Embedding-Ada-002, "Vec2Sent: Probing Sentence Embeddings with Natural Language Generation" was ranked as the most similar document. With AWS Bedrock's Amazon.Titan-Embed-Text-v1, the same paper was ranked 10th.





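  • No quantitative analysis has been done yet, so define evaluation metrics and use them for the analysis.
  • Qualitatively analyze whether users got the information they needed before and after reranking.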
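As a starting point for the quantitative side, two common ranking metrics are reciprocal rank and recall@k. The metrics for this evaluation have not been decided in this post, so the choice below is only an assumption, and the titles and relevance labels in the example are placeholders that a person would have to supply.

```python
def reciprocal_rank(ranked_titles, relevant_title):
    """1 / position of the relevant title (1-based), or 0.0 if it is absent."""
    for position, title in enumerate(ranked_titles, start=1):
        if title == relevant_title:
            return 1.0 / position
    return 0.0


def recall_at_k(ranked_titles, relevant_titles, k):
    """Fraction of the relevant titles that appear in the top k results."""
    if not relevant_titles:
        return 0.0
    top_k = set(ranked_titles[:k])
    hits = sum(1 for title in relevant_titles if title in top_k)
    return hits / len(relevant_titles)


# Placeholder ranking and labels (hypothetical data)
ranked = ["paper 3", "paper 1", "paper 2"]
print(reciprocal_rank(ranked, "paper 1"))            # 0.5
print(recall_at_k(ranked, ["paper 1", "paper 2"], 2))  # 0.5
```

Computing the same metric on the ranking before and after reranking gives a single number to compare, which complements the qualitative check.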


© Classmethod, Inc. All rights reserved.